Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies.

Author: A Myers
A Nica
A Price
BE Stranger
C Lippert
D Balding
D Locke
E Schadt
EN Smith
G Churchill
H Kang
HM Kang
HM Kang
J Listgarten
J Pickrell
J Yu
JT Leek
Matthew Stephens
MC Teixeira
MI McCarthy
Neil D. Lawrence
Nicoló Fusi
O Stegle
O Stegle
Oliver Stegle
R Breitling
RB Brem
V Plagnol
WE Johnson
X Gan
Publication venue: PLoS Comput Biol
Publication date: 01/01/2012

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/

Publikationsserver der Universität Tübingen

Apollo (Cambridge)

White Rose Research Online

FigShare

Joint Genetic Analysis of Gene Expression Data with Inferred Cellular Phenotypes

Author: A Komeili
B Stranger
BEE Stranger
C O'Conalláin
D Wykoff
E Chaibub Neto
EE Schadt
EJ Foss
EN Smith
EO Perlstein
G Gibson
G Yvert
HJ Cordell
J Aten
J Leek
J Smith
J Storey
J Zhu
JC Liao
JD Storey
JN Hirschhorn
John D. Storey
John Winn
Leopold Parts
M Costanzo
M Jordan
M Kanehisa
M Morley
M Rattray
MC Teixeira
ML Martin-Magniette
O Alter
O Stegle
O Stegle
O Stegle
Oliver Stegle
PY Lum
R Brem
R McCord
RB Brem
Richard Durbin
S Biswas
S Gygi
S Lee
SB Montgomery
TFC Mackay
TP Minka
W Goerner
W Sun
W Zhang
W Zou
Y Chen
Publication venue: Public Library of Science
Publication date: 01/01/2011

Even within a defined cell type, the expression level of a gene differs in individual samples. The effects of genotype, measured factors such as environmental conditions, and their interactions have been explored in recent studies. Methods have also been developed to identify unmeasured intermediate factors that coherently influence transcript levels of multiple genes. Here, we show how to bring these two approaches together and analyse genetic effects in the context of inferred determinants of gene expression. We use a sparse factor analysis model to infer hidden factors, which we treat as intermediate cellular phenotypes that in turn affect gene expression in a yeast dataset. We find that the inferred phenotypes are associated with locus genotypes and environmental conditions and can explain genetic associations to genes in trans. For the first time, we consider and find interactions between genotype and intermediate phenotypes inferred from gene expression levels, complementing and extending established results.

Genome-Scale Oscillations in DNA Methylation during Exit from Pluripotency

Author: Angermueller C.
Clark S.
Dean W.
Kelsey G.
Krueger F.
Lee H.
Mohammed H.
Nichols J.
Reik W.
Rugg-Gunn P.
Rulands S.
Simons B.
Smallwood S.
Stegle O.
Publication venue: Elsevier BV
Publication date: 25/07/2018

Pluripotency is accompanied by the erasure of parental epigenetic memory, with naive pluripotent cells exhibiting global DNA hypomethylation both in vitro and in vivo. Exit from pluripotency and priming for differentiation into somatic lineages is associated with genome-wide de novo DNA methylation. We show that during this phase, co-expression of enzymes required for DNA methylation turnover, DNMT3s and TETs, promotes cell-to-cell variability in this epigenetic mark. Using a combination of single-cell sequencing and quantitative biophysical modeling, we show that this variability is associated with coherent, genome-scale oscillations in DNA methylation with an amplitude dependent on CpG density. Analysis of parallel single-cell transcriptional and epigenetic profiling provides evidence for oscillatory dynamics both in vitro and in vivo. These observations provide insights into the emergence of epigenetic heterogeneity during early embryo development, indicating that dynamic changes in DNA methylation might influence early cell fate decisions.

Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm

Author: A Schliep
David L. Wild
E Cooke
Emma J. Cooke
G Brock
K Heller
L Bauwens
L Hubert
M Eisen
Magnus Rattray
NA Heard
NA Heard
O Stegle
P Ma
Paul D. W. Kirk
PDW Kirk
Q Liu
R Cho
Richard S. Savage
Robert Darkins
RS Savage
RS Savage
S Datta
S Frühwirth-Schnatter
W Chu
Z Bar-Joseph
Z Bar-Joseph
Zoubin Ghahramani
Publication venue: Public Library of Science (PLoS)
Publication date: 02/04/2013

We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper, at https://sites.google.com/site/randomisedbhc/

Warwick Research Archives Portal Repository

A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies

Author: AL Price
Aviv Regev
B Stranger
BEE Stranger
D Balding
DJC MacKay
DJC Mackay
E Lander
EE Schadt
EN Smith
G Gibson
HM Kang
J Reimand
J Winn
John Winn
JT Leek
Leopold Parts
M Jordan
M Rattray
O Stegle
Oliver Stegle
RB Brem
RB Brem
RB Williams
Richard Durbin
RM Neal
RSS Spielman
S Biswas
T Barrett
T Pastinen
V Emilsson
V Plagnol
Y Chen
Publication venue: Public Library of Science
Publication date: 01/01/2010

Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Employing Bayesian complexity control and joint modelling is shown to result in more precise estimates of the contribution of different confounding factors resulting in additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/

CiteSeerX

Queen Mary Research Online

Population-scale proteome variation in human induced pluripotent stem cells

Author: Agu CA
Alderton A
Beales P
Bensaddek D
Birney E
Bonder MJ
Brenes A
Casale FP
Clarke L
Danecek P
Denovi D
Denton R
Durbin R
Gaffney DJ
Goncalves A
Halai R
Harper S
Harrison PW
HipSci Consortium
Kilpinen H
Kilpinen H
Kirton CM
Kolb-Kokocinski A
Lamond AI
Lamond AI
Leha A
McCarthy SA
Meleckyte R
Memari Y
Mirauta BA
Moens N
Ouwehand WH
Patel M
Seaton DD
Stegle O
Stegle O
Streeter I
Watt FM
Publication date: 10/08/2020

Human disease phenotypes are driven primarily by alterations in protein expression and/or function. To date, relatively little is known about the variability of the human proteome in populations and how this relates to variability in mRNA expression and to disease loci. Here, we present the first comprehensive proteomic analysis of human induced pluripotent stem cells (iPSC), a key cell type for disease modelling, analysing 202 iPSC lines derived from 151 donors, with integrated transcriptome and genomic sequence data from the same lines. We characterised the major genetic and non-genetic determinants of proteome variation across iPSC lines and assessed key regulatory mechanisms affecting variation in protein abundance. We identified 654 protein quantitative trait loci (pQTLs) in iPSCs, including disease-linked variants in protein-coding sequences and variants with trans regulatory effects. These include pQTL linked to GWAS variants that cannot be detected at the mRNA level, highlighting the utility of dissecting pQTL at peptide level resolution.

UCL Discovery

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics

Author: Aivazidis A
Arutyunyan A
Bayraktar OA
Dann E
Gerstung M
Jain MS
James L
Kedlian V
King HW
Kleshchevnikov V
Li T
Lomakin A
Park JS
Ramona L
Shmatko A
Stegle O
Tuck E
Vento-Tormo R
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/01/2020

The spatial organization of cell types in tissues fundamentally shapes cellular interactions and function, but the high-throughput spatial mapping of complex tissues remains a challenge. We present cell2location, a principled and versatile Bayesian model that integrates single-cell and spatial transcriptomics to map cell types in situ in a comprehensive manner. We show that cell2location outperforms existing tools in accuracy and comprehensiveness and we demonstrate its utility by mapping two complex tissues. In the mouse brain, we use a new paired single nucleus and spatial RNA-sequencing dataset to map dozens of cell types and identify tissue regions in an automated manner. We discover novel regional astrocyte subtypes including fine subpopulations in the thalamus and hypothalamus. In the human lymph node, we resolve spatially interlaced immune cell states and identify co-located groups of cells underlying tissue organisation. We spatially map a rare pre-germinal centre B-cell population and predict putative cellular interactions relevant to the interferon response. Collectively our results demonstrate how cell2location can serve as a versatile first-line analysis tool to map tissue architectures in a high-throughput manner.

Competing Interest Statement: The authors have declared no competing interest.

Patterns of Cis Regulatory Variation in Diverse Human Populations

The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.

CiteSeerX

Harvard University - DASH

Publikationsserver der Universität Tübingen